Abstract —Collaborative Representation Classification (CRC)
for face recognition has attracted considerable attention
recently due to its good recognition performance and fast
speed. Compared to Sparse Representation Classification (SRC),
CRC achieves comparable recognition performance while running
10-1000 times faster. In this paper, we propose an ensemble of
several CRC models to improve the recognition rate, where each
CRC model uses a different, randomly generated
biologically-inspired feature as the face representation. The
proposed ensemble algorithm calculates a weight for each CRC
model, guided by the underlying classification rule of CRC.
The obtained weights reflect the confidence of the CRC models:
the more confident a CRC model is, the larger its weight. The
proposed weighted ensemble method proves to be effective and
significantly improves on the performance of the individual
CRC models. Extensive experiments are conducted to show the
superior performance of the proposed method.

Index Terms —Face recognition, Collaborative Representation 
Classification, Biologically inspired feature, Ensemble classifier 


I. Introduction 

Face recognition is one of the most active research topics in
computer vision due to its wide range of applications, from
public security to personal consumer electronics. Although
significant improvement has been achieved in the past decades,
a reliable face recognition system for real-life environments
is still very challenging to build due to the large intra-class
facial variations, such as expression, illumination, pose and
aging, and the small inter-class facial differences [1].

For a face recognition system, face representation and
classifier construction are the two key factors. Face
representations can be divided into two categories: holistic
feature based and local feature based. Principal Component
Analysis (PCA) based Eigenface [2] and Linear Discriminant
Analysis (LDA) based Fisherface [3] are the two most famous
holistic face representations. PCA projects the face image
into a subspace such that the most variation is kept, which is
optimal in terms of face reconstruction. LDA considers the
label information of the training data and linearly projects
the face image into a subspace such that the ratio of the
between-class scatter to the within-class scatter is
maximized. Both PCA and LDA project the face image into a
low-dimensional subspace in which classification is easier.
This is based on the assumption that the high-dimensional face
images lie on a low-dimensional subspace or sub-manifold.
Therefore, it is beneficial to first project the
high-dimensional face image into that low-dimensional subspace
to extract the main structure of the face data and reduce the
impact of unimportant factors, such as illumination changes.
Many other holistic face representations have been proposed
since, including Locality Preserving Projection (LPP) [4],
Independent Component Analysis (ICA) [5], Local Discriminant
Embedding (LDE) [6], Neighborhood Preserving Embedding (NPE)
[7], Maximum Margin Criterion (MMC) [8] and so on.

The holistic face representation is known to be sensitive to
expression, illumination, occlusion, noise and other local
distortions. The local face representation, which extracts
features using local information, has been shown to be more
robust against those factors. The most commonly used local
features in face recognition include Local Binary Pattern
(LBP) [9], Gabor Wavelets [10], Scale-Invariant Feature
Transform (SIFT) [11], Histogram of Oriented Gradients (HOG)
[12] and so on.

To classify the extracted face representations into the
correct classes, a classifier needs to be constructed. Many
classifiers have been proposed. The most widely used is the
Nearest Neighbor (NN) classifier, which has been improved in
different ways by Nearest Feature Line (NFL) [13], Nearest
Feature Plane (NFP) [14] and Nearest Feature Space (NFS) [14].
Recently, Sparse Representation Classification (SRC) [15] was
proposed; it shows good recognition performance and is robust
to random pixel noise and occlusion. SRC codes the test sample
as a sparse linear combination of all training samples by
imposing an $\ell_1$-norm constraint on the resulting coding
coefficients. Solving the $\ell_1$-norm constrained problem is
computationally very expensive, which is the main obstacle to
applying SRC in large-scale face recognition systems. Lately,
Collaborative Representation Classification (CRC) [16] was
proposed, which achieves performance comparable to SRC with a
much faster recognition speed. The authors of [16] find that
it is the collaborative representation, not the $\ell_1$-norm
constraint, that is important in the classification process.
By replacing the slow $\ell_1$-norm constraint with a much
faster $\ell_2$-norm constraint, CRC codes each test sample as
a linear combination of all the training faces with a
closed-form solution. As a result, CRC can recognize a test
sample 10-1000 times faster than SRC, as shown in [16].

In this paper, we propose an ensemble of several CRCs to boost
the performance of CRC. Each CRC is a weak classifier, and the
weak classifiers are combined to construct a strong classifier
named ensemble-CRC. For each test sample, several different
face representations are extracted. Then, several CRCs are
used to classify the sample using those face representations.
A weight is then calculated and assigned to each CRC by
considering the characteristics of its reconstruction
residues. By analyzing the relative magnitudes of the
reconstruction residues of different classes, the CRCs that
are most likely to be correct can be identified. Large weights
are assigned to those CRCs and small weights are assigned to
the remaining CRCs. Finally, the classification is obtained by
a weighted combination of the reconstruction residues of all
CRCs.

One key factor in the success of ensemble learning is
significant diversity among the weak classifiers. For example,
if different CRCs make different errors on the test samples,
then the combination of many CRCs tends to yield much better
results than any single CRC. To this end, randomly generated
biologically-inspired face representations are used.
Biologically-inspired features have produced very competitive
results in a variety of object and face recognition contexts
[17], [18], [19]. Most of them try to build artificial visual
systems that mimic the computational architecture of the
brain. We use a model similar to that in [20], in which the
authors showed that randomly generated biologically-inspired
features perform surprisingly well, provided that the proper
non-linearities and pooling layers are used. The randomly
generated biologically-inspired model is shown to be
inherently frequency selective and translation invariant under
certain convolutional pooling architectures [21]. It is
expected that different randomly generated
biologically-inspired features generate different face
representations (e.g., corresponding to different
frequencies). Therefore, the proposed ensemble-CRC can obtain
the significant diversity that is highly desired.

The rest of the paper is organized as follows. Section II
introduces the proposed ensemble-CRC method. Section III
conducts extensive experiments to verify the effectiveness of
ensemble-CRC. Finally, Section IV concludes the paper.

II. Proposed Method 
A. Ensemble-CRC 

First, we briefly introduce CRC. CRC codes a test sample as a
linear combination of all the training samples and imposes an
$\ell_2$-norm constraint on the coding coefficients. Then, the
reconstruction of the test sample by a specific class is
formed by linearly combining the training samples of that
class using the corresponding coding coefficients. The test
sample is classified into the class that has the smallest
reconstruction error.
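
As an illustration of the procedure just described, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the coding step is an $\ell_2$-regularized least-squares problem solved in closed form; the function names `crc_fit` and `crc_residuals`, the toy data and the value of `lam` are hypothetical choices made only for this example.

```python
import numpy as np

def crc_fit(A, lam):
    """Precompute P = (A^T A + lam*I)^{-1} A^T for a training matrix A (m x n)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T)

def crc_residuals(A, P, labels, y):
    """Return the class ids and the squared reconstruction error of y per class."""
    alpha = P @ y                      # coding coefficients over all training samples
    classes = np.unique(labels)
    errors = np.array([
        # reconstruct y using only the samples and coefficients of class c
        np.sum((y - A[:, labels == c] @ alpha[labels == c]) ** 2)
        for c in classes
    ])
    return classes, errors

# Toy data (made up): two classes, three 5-dimensional training samples each
rng = np.random.default_rng(0)
centers = np.array([[3.0, 0, 0, 0, 0], [0, 3.0, 0, 0, 0]])
A = np.hstack([centers[c][:, None] + 0.1 * rng.standard_normal((5, 3))
               for c in (0, 1)])
labels = np.array([0, 0, 0, 1, 1, 1])

P = crc_fit(A, lam=0.01)                           # computed once, reused for every test sample
y = centers[1] + 0.1 * rng.standard_normal(5)      # test sample drawn near class 1
classes, errors = crc_residuals(A, P, labels, y)
predicted = classes[np.argmin(errors)]             # class with the smallest residual
```

Because the matrix returned by `crc_fit` does not depend on the test sample, it can be computed once per training set; classifying a new sample then costs one matrix-vector product plus one residual per class.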

More specifically, suppose there are $n$ training samples from
$c$ different classes. For each class $j = 1, 2, \ldots, c$,
there are $n_j$ training samples. The $i$-th training sample of
class $j$ is denoted as $x_{ji} \in \mathbb{R}^m$, where $m$ is
the feature dimensionality. Let
$A = [A_1, A_2, \ldots, A_c] \in \mathbb{R}^{m \times n}$ be the
set of all training samples, where
$A_j = [x_{j1}, x_{j2}, \ldots, x_{jn_j}] \in \mathbb{R}^{m \times n_j}$ is composed of
training samples from class $j$. For a given test sample $y$, CRC
solves the following problem

$$\hat{\alpha} = \arg\min_{\alpha} \left\{ \|y - A\alpha\|_2^2 + \lambda \|\alpha\|_2^2 \right\}, \tag{1}$$

where $\lambda$ is the regularization parameter. The solution of
the above problem can be obtained analytically as

$$\hat{\alpha} = (A^T A + \lambda I)^{-1} A^T y. \tag{2}$$

Let $P = (A^T A + \lambda I)^{-1} A^T$. It can be seen that $P$ is
independent of the test sample $y$ and can be pre-calculated.
For each test sample, we only need to compute $Py$ to obtain
the coding coefficients. To classify $y$, the reconstruction
of $y$ by each class should be calculated. For each class $j$,
let $\delta_j : \mathbb{R}^n \to \mathbb{R}^n$ be the characteristic
function that keeps the coefficients of class $j$ and sets the
coefficients associated with the other classes to 0. The
reconstruction of $y$ by class $j$ is obtained as
$\hat{y}_j = A\delta_j(\hat{\alpha})$. The reconstruction error
of class $j$ is obtained by

$$e_j = \|y - \hat{y}_j\|_2^2 = \|y - A\delta_j(\hat{\alpha})\|_2^2. \tag{3}$$

CRC classifies $y$ into the class that has the minimum
reconstruction error.

The proposed ensemble-CRC utilizes multiple CRCs and combines
them to obtain a final classification. Assume there are $k$
different face representations extracted from each face, so
that $k$ training sets $A^1, \ldots, A^k$ can be formed, where
$A^i = [A^i_1, A^i_2, \ldots, A^i_c] \in \mathbb{R}^{m \times n}$
for $i = 1, \ldots, k$. Then, $k$ projection matrices
$P^1, \ldots, P^k$ can be obtained from $A^1, \ldots, A^k$. For a
test sample $y$, $k$ different representations are extracted
and denoted as $y^1, \ldots, y^k$. For each triple
$(y^i, P^i, A^i)$, the coding coefficients $\hat{\alpha}^i$ can
be obtained using Equation (2) and the corresponding
reconstruction errors $e^i_j$ can be obtained using
Equation (3).

Different face representations perform differently for a
particular test sample; therefore, proper weights should be
assigned to the different CRCs given the test sample. Notice
that CRC determines the class of the test sample by selecting
the minimum reconstruction error. If the correct class
produces a small reconstruction error and all incorrect
classes produce large reconstruction errors, CRC easily makes
the correct classification. However, when some incorrect
classes produce reconstruction errors similar to or smaller
than that of the correct class, CRC may make a wrong
classification. In the latter situation, the reconstruction
error of the correct class is usually among the several
smallest reconstruction errors. In summary, CRC is highly
likely to be correct when there is only one small
reconstruction error, and less likely to be correct when there
are several small reconstruction errors. We utilize this
observation to guide the calculation of the weights. For each
representation, the smallest (denoted as $e_s$) and the second
smallest (denoted as $e_{ss}$) reconstruction errors are
picked, and the difference between them is calculated as
$d = e_{ss} - e_s$. Each representation has its own difference
value, so $k$ difference values $d^1, \ldots, d^k$ are
obtained. Then, the weight of the $i$-th CRC is calculated as

$$w^i = \frac{d^i}{d^1 + d^2 + \cdots + d^k}. \tag{4}$$
It is clear that the larger the difference, the larger the
weight. After obtaining all the weights, the combined
reconstruction error of class $j$ is calculated as

$$e_j = w^1 e_j^1 + w^2 e_j^2 + \cdots + w^k e_j^k. \tag{5}$$

The ensemble-CRC assigns the test sample to the class whose
combined reconstruction error is minimal.
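
The weighting and combination rule can be sketched as follows. This is a hedged illustration rather than the authors' code: it assumes the per-class reconstruction errors of the individual CRCs are already available as a matrix, the function name `ensemble_crc_decision` and the example numbers are hypothetical, and the uniform-weight fallback for the case where every gap is zero is an added assumption that the text does not discuss.

```python
import numpy as np

def ensemble_crc_decision(errors):
    """errors[i, j]: reconstruction error of class j under the i-th CRC (k x c)."""
    errors = np.asarray(errors, dtype=float)
    # Smallest and second-smallest error of every CRC (requires c >= 2)
    two_smallest = np.partition(errors, 1, axis=1)[:, :2]
    d = two_smallest[:, 1] - two_smallest[:, 0]        # gap between them, always >= 0
    total = d.sum()
    # Assumption: use uniform weights when all gaps are zero
    w = d / total if total > 0 else np.full(len(d), 1.0 / len(d))
    combined = w @ errors                              # weighted sum of errors per class
    return w, combined, int(np.argmin(combined))

# Made-up errors for k = 3 CRCs and c = 3 classes
errors = np.array([
    [0.10, 0.90, 0.80],   # clear gap: confident vote for class 0
    [0.50, 0.45, 0.70],   # tiny gap: uncertain vote for class 1
    [0.20, 0.60, 0.90],   # moderate gap: vote for class 0
])
w, combined, label = ensemble_crc_decision(errors)
```

In this example the second CRC prefers a different class than the other two, but its gap is small, so it receives the smallest weight and has little influence on the combined decision.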

B. Randomly Generated Biologically-Inspired Feature 

The biologically-inspired features used in the proposed
ensemble-CRC are similar in form to the biologically-inspired
features in [20]. The feature extraction process includes four
layers: a filter bank layer, a rectification layer, a local
contrast normalization layer and a pooling layer. Different
biologically-inspired features can be obtained by modifying
the structure of the extraction process or by using different
model parameters. The details of each layer are introduced in
the following.

• Filter bank layer. The input image is convolved with a
certain number of filters. Assume the input image $x$ has
size $n_1 \times n_2$ and each filter $k$ has size
$l_1 \times l_2$; the convolved output (or feature map) $y$
will then have size $(n_1 - l_1 + 1) \times (n_2 - l_2 + 1)$.
The output can be computed as

$$y = g \cdot \tanh(k \otimes x) \tag{6}$$

where $\otimes$ is the convolution operation, $\tanh$ is the
hyperbolic tangent non-linearity and $g$ is a gain factor.

• Rectification layer. This layer simply applies the absolute
value function to the output of the filter bank layer, i.e.,
$y = |y|$.

• Local contrast normalization layer. Local subtractive and
divisive normalizations are performed, which enforce local
competition between adjacent features in a feature map. More
details can be found in [22].

• Pooling layer. The pooling layer transforms the joint
feature representation into a more robust feature that
achieves invariance to transformations, clutter and small
distortions. Max pooling or average pooling can be used. For
max pooling, the maximum value of a small non-overlapping
region in the feature map is selected, and all other features
in this local region are discarded. Average pooling returns
the average value of the local region in the feature map.
After pooling, the number of features in each feature map is
reduced. The reduction ratio is determined by the size of the
local region.

It is shown in [20] that the filters in the filter bank layer
can be assigned small random values and that the resulting
randomly generated features still achieve very good
recognition performance on several image classification
benchmark data sets.
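
The four layers above can be sketched in NumPy for a single random filter. This is a simplified illustration under stated assumptions, not the authors' pipeline: the `normalize` function is a global subtractive and divisive normalization used as a stand-in for the local contrast normalization of [22], and all function names, the random placeholder image and the default `gain` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, kern):
    """'Valid' 2-D convolution; output size is (n1 - l1 + 1) x (n2 - l2 + 1)."""
    windows = sliding_window_view(x, kern.shape)       # (n1-l1+1, n2-l2+1, l1, l2)
    # Flipping the kernel turns the sliding correlation into a convolution
    return np.einsum("ijkl,kl->ij", windows, kern[::-1, ::-1])

def normalize(y, eps=1e-8):
    """Assumption: global normalization as a simplified stand-in for local contrast normalization."""
    y = y - y.mean()
    return y / (y.std() + eps)

def max_pool(y, size=2):
    """Maximum over non-overlapping size x size regions (ragged borders are cropped)."""
    h, w = (y.shape[0] // size) * size, (y.shape[1] // size) * size
    blocks = y[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

def random_feature(x, kern, gain=1.0):
    y = gain * np.tanh(conv2d_valid(x, kern))   # filter bank layer
    y = np.abs(y)                               # rectification layer
    y = normalize(y)                            # simplified contrast normalization
    return max_pool(y, 2)                       # pooling layer

rng = np.random.default_rng(0)
image = rng.random((64, 43))                            # placeholder standing in for a face image
kern = rng.uniform(-0.001, 0.001, size=(5, 5))          # filter with small random values
feature_map = random_feature(image, kern)               # one feature map per random filter
```

Drawing a fresh `kern` for each CRC yields a different feature map from the same image, which is the source of diversity the ensemble relies on.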

The reason we select the randomly generated
biologically-inspired features for the proposed ensemble-CRC
is twofold. First, they perform well in many different visual
recognition problems; second, their randomness provides
diversity. It is shown in [23] that a necessary and sufficient
condition for an ensemble of classifiers to be more accurate
than any of its individual members is that the classifiers are
accurate and diverse.



Fig. 2. Sample face images from the (a) AR and (b) LFW databases.

C. The Complete Recognition Process 

The complete recognition process for a test face image is
shown in Fig. 1. The input face image is first convolved with
k filters and then transformed non-linearly. As a result, k
feature maps are obtained, which are then rectified and
normalized. Then, pooling is used to extract the salient
features and reduce the size of the feature maps. Because the
extracted feature maps are still large, we flatten the 2-D
feature maps into 1-D vectors and use PCA to reduce their
dimensionality. After PCA, the k feature maps have been
transformed into k face representations with reduced
dimensionality, which completes the feature extraction. Next,
the k extracted features are used by k CRCs, and the k
classification results are combined with weights to form the
final classification result.
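
The flattening and PCA step can be sketched as follows. This is an illustrative sketch only: it assumes a separate PCA is fitted on the training feature maps of each filter and computes it via the singular value decomposition; the helper names `pca_fit` and `pca_transform`, the random placeholder data and the target dimension are hypothetical.

```python
import numpy as np

def pca_fit(X, dim):
    """X: (num_samples, num_features) matrix of flattened feature maps."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:dim]

def pca_transform(X, mean, components):
    """Project centered samples onto the retained principal directions."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
maps = rng.random((20, 30, 19))              # 20 made-up pooled feature maps
X = maps.reshape(len(maps), -1)              # 2-D maps -> 1-D vectors (20 x 570)
mean, components = pca_fit(X, dim=10)
Z = pca_transform(X, mean, components)       # reduced representations (20 x 10)
```

The same `mean` and `components` fitted on the training samples would be reused to project each test sample before it is passed to the corresponding CRC.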

III. Experiment 

We compare the proposed ensemble-CRC with CRC [16], AW-CRC
(Adaptive and Weighted Collaborative Representation
Classification) [24], SRC [15], WSRC (Weighted Sparse
Representation Classification) [25] and RPPFE (Random
Projection based Partial Feature Extraction) [26], using the
AR [27] and LFW [28] face databases.

The AR database consists of over 4,000 frontal face images
from 126 individuals. The images have different facial
expressions, illumination conditions and occlusions. The
images were taken in two separate sessions, two weeks apart.
In our experiment, we choose a subset of the AR database
consisting of 50 male subjects and 50 female subjects and
crop each image to 64 × 43. For each subject, the seven images
with only illumination and expression changes from Session one
are used for training. The seven images with only illumination
and expression changes from Session two are used for testing.

The Labeled Faces in the Wild (LFW) database is a very
challenging database consisting of faces with great variations
in lighting, pose, expression and age. It contains 13,233 face
images of 5,749 persons. LFW-a is a version of LFW in which
the face images are aligned using commercial face alignment
software. We adopt the same experimental setting as in [29].
In detail, the 158 subjects in LFW-a that have at least 10
images are chosen. For each subject, 10 images are selected
for the experiment. Thus, there are in total 1,580 images used
in our experiment. Each image is first cropped to 121 × 121
and then resized to 32 × 32. Five images are used for training
and the other five images for testing.

In all the following experiments, the filter size is 5 × 5,
and all filters are randomly generated from a uniform
distribution on [-0.001, 0.001]. The non-linearity function
used is $f(a) = 1.7159 \tanh(0.6667a)$ as in [17]. The pooling
used is max pooling with size 2 × 2.

[Figure: processing stages labeled "Convolutions and non-linearity transform", "Rectification", "Contrast Normalization"]

Fig. 1. The flowchart of the recognition process of the proposed ensemble-CRC.



A. Number of CRCs in Ensemble-CRC 

The number of weak classifiers in an ensemble classifier is
very important to its performance. Increasing the number of
weak classifiers improves the performance of the ensemble
classifier at first, but the performance may degrade when too
many weak classifiers are used. Also, the more weak
classifiers there are, the more computation is needed. Next,
we conduct several experiments on the AR database to show the
impact of the number of weak classifiers and to find the best
number experimentally.

We vary the number of weak classifiers from 1 to 128, with the
dimension after PCA set to 300. We repeat the experiment 10
times and report the average result in Fig. 3. It can be seen
that the recognition rate is 92.4% when only one CRC is used.
With eight CRCs included in ensemble-CRC, the performance
increases rapidly to 97.1%. When 64 CRCs are used in
ensemble-CRC, the performance is around 98%, and more CRCs do
not improve the performance further. We conclude that 64 CRCs
appears to be the best number of weak classifiers. All the
remaining experiments thus use 64 CRCs in ensemble-CRC.

B. Weighted vs. Non-Weighted Ensemble-CRC

In the proposed ensemble-CRC, a weight is calculated for each
CRC. Alternatively, the weights can all be set to 1, and the
resulting ensemble-CRC can be regarded as a non-weighted
ensemble-CRC. In the following, we compare the performance of
the proposed weighted ensemble-CRC and the non-weighted
ensemble-CRC on the AR database, using a feature dimension of
100. Fig. 4 shows that the weighted ensemble-CRC consistently
outperforms the non-weighted ensemble-CRC.

Fig. 4. The performance comparison of the proposed weighted ensemble-CRC
and the non-weighted ensemble-CRC

C. Performance Comparison With Other Methods 

In the following, the proposed ensemble-CRC is compared with
CRC, AW-CRC, SRC, WSRC and RPPFE. Different feature dimensions
are compared for each database, as shown in Fig. 5. For the AR
database, ensemble-CRC achieves a recognition rate of 91.85%
with a feature dimension of 50, which is 12.88% higher than
that of CRC (78.97%), 10.73% higher than that of AW-CRC
(81.1%), 8.87% higher than that of SRC (82.98%), 9.02% higher
than that of WSRC (82.83%) and 19.79% higher than that of
RPPFE (72.06%). As the dimension increases, the performance of
ensemble-CRC, CRC, AW-CRC, SRC, WSRC and RPPFE all increase
gradually. The highest recognition rates of ensemble-CRC, CRC,
AW-CRC, SRC, WSRC and RPPFE are 98.10%, 93.84%, 93.99%,
92.99%, 93.13% and 95.84%, respectively. It is clear that the
proposed ensemble-CRC outperforms all other methods.

The LFW database is quite difficult. The highest recognition
rates obtained by CRC, AW-CRC, SRC, WSRC and RPPFE are
33.67%, 36.32%, 35.95% and 37.97%, which are much lower than
those on the AR database. The proposed ensemble-CRC achieves
the highest recognition rate of 48.77%, which is much higher
than that of CRC, AW-CRC, SRC, WSRC and RPPFE. Due to the
pooling operation, the dimension of each randomly generated
biologically-inspired feature is limited to 190. However, the
recognition rate may be higher if a higher-dimensional
randomly generated biologically-inspired feature can be used
(e.g., with a larger input image size), as can be inferred
from the recognition rate curve of ensemble-CRC.

Fig. 5. The performance comparison of the proposed ensemble-CRC with
CRC and AW-CRC on AR and LFW database.

IV. Conclusion 

In this paper, a novel face recognition algorithm named
ensemble-CRC is proposed. Ensemble-CRC utilizes randomly
generated biologically-inspired features to create many
high-performance and diverse CRCs, which are combined in a
weighted manner. The experimental results show that the
proposed ensemble-CRC outperforms CRC, AW-CRC, SRC, WSRC and
RPPFE.